HIVE-28930: Implement a metastore service that expires iceberg table snapshots periodically #5786

Merged
abstractdog merged 8 commits into apache:master from abstractdog:HIVE-28930
Jun 8, 2025

@abstractdog abstractdog commented Apr 25, 2025

Contributor

What changes were proposed in this pull request?

This patch introduces a metastore task, implemented as a MetastoreTaskThread, that periodically expires snapshots of Iceberg tables selected by configuration: catalog name, database pattern, and table pattern. The configuration is modeled on the partition management task.
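
As a rough illustration of the idea of a periodic expiry task, here is a minimal sketch. It is not the code from this patch: the class name `PeriodicExpirySketch`, the `tableLister` and `snapshotExpirer` collaborators, and the interval parameter are all hypothetical, and the actual Iceberg expiry call is deliberately left behind a callback.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: the names below are illustrative assumptions,
// not the classes or methods added by this patch.
class PeriodicExpirySketch implements Runnable {
    // Returns the names of the tables selected by the configuration (assumed).
    private final Supplier<List<String>> tableLister;
    // Expires the snapshots of a single table (assumed callback).
    private final Consumer<String> snapshotExpirer;
    private final long intervalSeconds;

    PeriodicExpirySketch(Supplier<List<String>> tableLister,
                         Consumer<String> snapshotExpirer,
                         long intervalSeconds) {
        this.tableLister = tableLister;
        this.snapshotExpirer = snapshotExpirer;
        this.intervalSeconds = intervalSeconds;
    }

    // How often the task wants to run, converted to the caller's time unit.
    long runFrequency(TimeUnit unit) {
        return unit.convert(intervalSeconds, TimeUnit.SECONDS);
    }

    // One pass: try every selected table, and keep going if one of them fails.
    @Override
    public void run() {
        for (String table : tableLister.get()) {
            try {
                snapshotExpirer.accept(table);
            } catch (RuntimeException e) {
                System.err.println("Failed to expire snapshots for " + table + ": " + e.getMessage());
            }
        }
    }
}
```

A scheduler such as `ScheduledExecutorService.scheduleAtFixedRate` could invoke `run()` at the interval reported by `runFrequency`. Isolating failures per table is a design choice in this sketch, so that one broken table does not block maintenance of the others.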

Patch contents:

  1. IcebergHouseKeeperService + TestIcebergHouseKeeperService unit test
  2. Added the new task class (ICEBERG_TABLE_SNAPSHOT_EXPIRY_SERVICE_CLASS) to the default housekeeping threads
  3. MiniHS2 changes: withHouseKeepingThreads (for manual testing)
  4. Changed a call to use keepJdbcUri=true; otherwise, in remote metastore mode, two different Derby databases are used, which leads to hard-to-diagnose problems
  5. Generalized TableFetcher, which is basically a table filter builder that originally lived in PartitionManagementTask and is now fully reused, + TestTableFetcher unit test
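
To make the "table filter builder" idea concrete, here is a minimal sketch of selecting tables by catalog name, database pattern, and table pattern. It is not the TableFetcher implementation: the class `TableFilterSketch` is hypothetical, and it assumes that `*` in a pattern matches any run of characters and that matching is case-insensitive.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: this is not the TableFetcher from the patch.
class TableFilterSketch {
    private final String catalog;
    private final Pattern dbPattern;
    private final Pattern tablePattern;

    TableFilterSketch(String catalog, String dbGlob, String tableGlob) {
        this.catalog = catalog;
        this.dbPattern = globToRegex(dbGlob);
        this.tablePattern = globToRegex(tableGlob);
    }

    // Assumption: '*' matches any run of characters; everything else is literal.
    static Pattern globToRegex(String glob) {
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' in the glob becomes ".*"
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i])); // literal text is quoted
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // True when the table is in the configured catalog and both patterns match.
    boolean matches(String tableCatalog, String db, String table) {
        return catalog.equalsIgnoreCase(tableCatalog)
            && dbPattern.matcher(db).matches()
            && tablePattern.matcher(table).matches();
    }
}
```

For example, `new TableFilterSketch("hive", "sales_*", "*")` would accept every table in databases whose names start with `sales_` in the `hive` catalog and reject everything else.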

Why are the changes needed?

This service is a convenient helper for maintaining Iceberg tables, which otherwise requires the user to run explicit Hive QL statements.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests added.

Manual testing is also possible, because the patch adds a MiniHS2 capability and fixes for running metastore tasks in remote mode. Example command:

mvn clean install -Dtest=StartMiniHS2Cluster -DminiHS2.clusterType=llap -DminiHS2.conf="target/testconf/llap/hive-site.xml"  -DminiHS2.run=true -DminiHS2.usePortsFromConf=true -pl itests/hive-unit -Pitests -pl itests/util -DminiHS2.clusterType=LOCALFS_ONLY -DminiHS2.isMetastoreRemote=true -DminiHS2.withHouseKeepingThreads=true

@abstractdog

Contributor Author

@deniskuzZ: this is the reusable, general part of the Iceberg table maintenance service (it contains no query history bits). I would appreciate a review once you have time.

@deniskuzZ deniskuzZ left a comment

Member

In general LGTM, minor comments

@deniskuzZ deniskuzZ left a comment

Member

+1, pending tests

@sonarqubecloud

sonarqubecloud Bot commented Jun 8, 2025

@abstractdog abstractdog merged commit 88dc983 into apache:master Jun 8, 2025
6 checks passed